{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数组读写"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 从文本中读取数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 空格（制表符）分割的文本"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们有这样的一个空白分割的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing myfile.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile myfile.txt\n",
    "2.1 2.3 3.2 1.3 3.1\n",
    "6.1 3.1 4.2 2.3 1.8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了生成数组，我们首先将数据转化成一个列表组成的列表，再将这个列表转换为数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = []\n",
    "\n",
    "with open('myfile.txt') as f:\n",
    "    # 每次读一行\n",
    "    for line in f:\n",
    "        fileds = line.split()\n",
    "        row_data = [float(x) for x in fileds]\n",
    "        data.append(row_data)\n",
    "\n",
    "data = np.array(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.1,  2.3,  3.2,  1.3,  3.1],\n",
       "       [ 6.1,  3.1,  4.2,  2.3,  1.8]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "不过，更简便的是使用 `loadtxt` 方法："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.1,  2.3,  3.2,  1.3,  3.1],\n",
       "       [ 6.1,  3.1,  4.2,  2.3,  1.8]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.loadtxt('myfile.txt')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 逗号分隔文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting myfile.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile myfile.txt\n",
    "2.1, 2.3, 3.2, 1.3, 3.1\n",
    "6.1, 3.1, 4.2, 2.3, 1.8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于逗号分隔的文件（通常为`.csv`格式）,我们可以稍微修改之前繁琐的过程，将 `split` 的参数变成 `','`即可。\n",
    "\n",
    "不过，`loadtxt` 函数也可以读这样的文件，只需要制定分割符的参数即可："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.1,  2.3,  3.2,  1.3,  3.1],\n",
       "       [ 6.1,  3.1,  4.2,  2.3,  1.8]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.loadtxt('myfile.txt', delimiter=',')\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### loadtxt 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    loadtxt(fname, dtype=<type 'float'>, \n",
    "            comments='#', delimiter=None, \n",
    "            converters=None, skiprows=0, \n",
    "            usecols=None, unpack=False, ndmin=0)\n",
    "\n",
    "`loadtxt` 有很多可选参数，其中 `delimiter` 就是刚才用到的分隔符参数。\n",
    "\n",
    "`skiprows` 参数表示忽略开头的行数，可以用来读写含有标题的文本"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting myfile.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile myfile.txt\n",
    "X Y Z MAG ANG\n",
    "2.1 2.3 3.2 1.3 3.1\n",
    "6.1 3.1 4.2 2.3 1.8"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.1,  2.3,  3.2,  1.3,  3.1],\n",
       "       [ 6.1,  3.1,  4.2,  2.3,  1.8]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.loadtxt('myfile.txt', skiprows=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此外，有一个功能更为全面的 `genfromtxt` 函数，能处理更多的情况，但相应的速度和效率会慢一些。\n",
    "\n",
    "    genfromtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, \n",
    "               skiprows=0, skip_header=0, skip_footer=0, converters=None, \n",
    "               missing='', missing_values=None, filling_values=None, usecols=None, \n",
    "               names=None, excludelist=None, deletechars=None, replace_space='_', \n",
    "               autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, \n",
    "               usemask=False, loose=True, invalid_raise=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### loadtxt 的更多特性"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于这样一个文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting myfile.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile myfile.txt\n",
    " -- BEGINNING OF THE FILE\n",
    "% Day, Month, Year, Skip, Power\n",
    "01, 01, 2000, x876, 13 % wow!\n",
    "% we don't want have Jan 03rd\n",
    "04, 01, 2000, xfed, 55"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[   1,    1, 2000,   13],\n",
       "       [   4,    1, 2000,   55]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.loadtxt('myfile.txt', \n",
    "                  skiprows=1,         #忽略第一行\n",
    "                  dtype=np.int,      #数组类型\n",
    "                  delimiter=',',     #逗号分割\n",
    "                  usecols=(0,1,2,4), #指定使用哪几列数据\n",
    "                  comments='%'       #百分号为注释符\n",
    "                 )\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### loadtxt 自定义转换方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting myfile.txt\n"
     ]
    }
   ],
   "source": [
    "%%writefile myfile.txt\n",
    "2010-01-01 2.3 3.2\n",
    "2011-01-01 6.1 3.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "假设我们的文本包含日期，我们可以使用 `datetime` 在 `loadtxt` 中处理："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[datetime.datetime(2010, 1, 1, 0, 0), 2.3, 3.2],\n",
       "       [datetime.datetime(2011, 1, 1, 0, 0), 6.1, 3.1]], dtype=object)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import datetime\n",
    "\n",
    "def date_converter(s):\n",
    "    return datetime.datetime.strptime(s, \"%Y-%m-%d\")\n",
    "\n",
    "data = np.loadtxt('myfile.txt',\n",
    "                  dtype=np.object, #数据类型为对象\n",
    "                  converters={0:date_converter,  #第一列使用自定义转换方法\n",
    "                              1:float,           #第二第三使用浮点数转换\n",
    "                              2:float})\n",
    "\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "移除 `myfile.txt`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove('myfile.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 读写各种格式的文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如下表所示：\n",
    "\n",
    "文件格式|使用的包|函数\n",
    "----|----|----\n",
    "txt | numpy | loadtxt, genfromtxt, fromfile, savetxt, tofile\n",
    "csv | csv | reader, writer\n",
    "Matlab | scipy.io | loadmat, savemat\n",
    "hdf | pytables, h5py| \n",
    "NetCDF | netCDF4, scipy.io.netcdf | netCDF4.Dataset, scipy.io.netcdf.netcdf_file\n",
    "**文件格式**|**使用的包**|**备注**\n",
    "wav | scipy.io.wavfile | 音频文件\n",
    "jpeg,png,...| PIL, scipy.misc.pilutil | 图像文件\n",
    "fits | pyfits | 天文图像\n",
    "\n",
    "此外， `pandas` ——一个用来处理时间序列的包中包含处理各种文件的方法，具体可参见它的文档：\n",
    "\n",
    "http://pandas.pydata.org/pandas-docs/stable/io.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 将数组写入文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`savetxt` 可以将数组写入文件，默认使用科学计数法的形式保存："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data = np.array([[1,2], \n",
    "                 [3,4]])\n",
    "\n",
    "np.savetxt('out.txt', data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.000000000000000000e+00 2.000000000000000000e+00\n",
      "3.000000000000000000e+00 4.000000000000000000e+00\n"
     ]
    }
   ],
   "source": [
    "with open('out.txt') as f:\n",
    "    for line in f:\n",
    "        print line,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "也可以使用类似**C**语言中 `printf` 的方式指定输出的格式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = np.array([[1,2], \n",
    "                 [3,4]])\n",
    "\n",
    "np.savetxt('out.txt', data, fmt=\"%d\") #保存为整数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 2\n",
      "3 4\n"
     ]
    }
   ],
   "source": [
    "with open('out.txt') as f:\n",
    "    for line in f:\n",
    "        print line,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "逗号分隔的输出："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = np.array([[1,2], \n",
    "                 [3,4]])\n",
    "\n",
    "np.savetxt('out.txt', data, fmt=\"%.2f\", delimiter=',') #保存为2位小数的浮点数，用逗号分隔"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.00,2.00\n",
      "3.00,4.00\n"
     ]
    }
   ],
   "source": [
    "with open('out.txt') as f:\n",
    "    for line in f:\n",
    "        print line,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "复数值默认会加上括号："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "data = np.array([[1+1j,2], \n",
    "                 [3,4]])\n",
    "\n",
    "np.savetxt('out.txt', data, fmt=\"%.2f\", delimiter=',') #保存为2位小数的浮点数，用逗号分隔"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " (1.00+1.00j), (2.00+0.00j)\n",
      " (3.00+0.00j), (4.00+0.00j)\n"
     ]
    }
   ],
   "source": [
    "with open('out.txt') as f:\n",
    "    for line in f:\n",
    "        print line,"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "更多参数："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "    savetxt(fname, \n",
    "            X, \n",
    "            fmt='%.18e', \n",
    "            delimiter=' ', \n",
    "            newline='\\n', \n",
    "            header='', \n",
    "            footer='', \n",
    "            comments='# ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "移除 `out.txt`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove('out.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numpy 二进制格式"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数组可以储存成二进制格式，单个的数组保存为 `.npy` 格式，多个数组保存为多个`.npy`文件组成的 `.npz` 格式，每个 `.npy` 文件包含一个数组。\n",
    "\n",
    "与文本格式不同，二进制格式保存了数组的 `shape, dtype` 信息，以便完全重构出保存的数组。\n",
    "\n",
    "保存的方法：\n",
    "\n",
    "- `save(file, arr)` 保存单个数组，`.npy` 格式\n",
    "- `savez(file, *args, **kwds)` 保存多个数组，无压缩的 `.npz` 格式\n",
    "- `savez_compressed(file, *args, **kwds)` 保存多个数组，有压缩的 `.npz` 格式\n",
    "\n",
    "读取的方法：\n",
    "\n",
    "- `load(file, mmap_mode=None)` 对于 `.npy`，返回保存的数组，对于 `.npz`，返回一个名称-数组对组成的字典。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 单个数组的读写"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "a = np.array([[1.0,2.0], [3.0,4.0]])\n",
    "\n",
    "fname = 'afile.npy'\n",
    "np.save(fname, a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.,  2.],\n",
       "       [ 3.,  4.]])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aa = np.load(fname)\n",
    "aa"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除生成的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.remove('afile.npy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 二进制与文本大小比较"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "a = np.arange(10000.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "保存为文本："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "np.savetxt('a.txt', a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "260000L"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "os.stat('a.txt').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "保存为二进制："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.save('a.npy', a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "80080L"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "os.stat('a.npy').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除生成的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "os.remove('a.npy')\n",
    "os.remove('a.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到，二进制文件大约是文本文件的三分之一。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 保存多个数组"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "a = np.array([[1.0,2.0], \n",
    "              [3.0,4.0]])\n",
    "b = np.arange(1000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "保存多个数组："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.savez('data.npz', a=a, b=b)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "查看里面包含的文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Archive:  data.npz\n",
      "  Length      Date    Time    Name\n",
      "---------  ---------- -----   ----\n",
      "      112  2015/08/10 00:46   a.npy\n",
      "     4080  2015/08/10 00:46   b.npy\n",
      "---------                     -------\n",
      "     4192                     2 files\n"
     ]
    }
   ],
   "source": [
    "!unzip -l data.npz"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "载入数据："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "data = np.load('data.npz')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "载入后可以像字典一样进行操作："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['a', 'b']"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.,  2.],\n",
       "       [ 3.,  4.]])"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['a']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1000L,)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['b'].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "删除文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 要先删除 data，否则删除时会报错\n",
    "del data\n",
    "\n",
    "os.remove('data.npz')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 压缩文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当数据比较整齐时："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "a = np.arange(20000.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "无压缩大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "160188L"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.savez('a.npz', a=a)\n",
    "os.stat('a.npz').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有压缩大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "26885L"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.savez_compressed('a2.npz', a=a)\n",
    "os.stat('a2.npz').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "大约有 6x 的压缩效果。\n",
    "\n",
    "当数据比较混乱时："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "a = np.random.rand(20000.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "无压缩大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "160188L"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.savez('a.npz', a=a)\n",
    "os.stat('a.npz').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "有压缩大小："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "151105L"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.savez_compressed('a2.npz', a=a)\n",
    "os.stat('a2.npz').st_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "只有大约 1.06x 的压缩效果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "os.remove('a.npz')\n",
    "os.remove('a2.npz')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
